Search CORE

14 research outputs found

Multi-Agent Programming Contest 2011 - The Python-DTU Team

Author: Ettienne Mikko Berggren
Vester Steen
Villadsen Jørgen
Publication date: 01/01/2011

We provide a brief description of the Python-DTU system, including the overall design, the tools and the algorithms that we plan to use in the agent contest. Comment: 4 pages.

arXiv.org e-Print Archive

Online Research Database In Technology

Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing

Author: Bille Philip
Ettienne Mikko Berggren
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Publication date: 01/01/2017

Given a string $S$, the compressed indexing problem is to preprocess $S$ into a compressed representation that supports fast substring queries. The goal is to use little space relative to the compressed size of $S$ while supporting fast queries. We present a compressed index based on the Lempel-Ziv 1977 compression scheme. Let $n$ and $z$ denote the size of the input string and of the compressed LZ77 string, respectively. We obtain the following time-space trade-offs. Given a pattern string $P$ of length $m$, where $occ$ is the number of occurrences of $P$ in $S$, we can solve the problem in (i) $O(m + occ \lg\lg n)$ time using $O(z \lg(n/z) \lg\lg z)$ space, or (ii) $O(m(1 + \lg^\epsilon z / \lg(n/z)) + occ(\lg\lg n + \lg^\epsilon z))$ time using $O(z \lg(n/z))$ space, for any $0 < \epsilon < 1$. In particular, (i) improves the leading term in the query time of the previous best solution from $O(m \lg m)$ to $O(m)$ at the cost of increasing the space by a factor $\lg\lg z$. Alternatively, (ii) matches the previous best space bound, but has a leading term in the query time of $O(m(1 + \lg^\epsilon z / \lg(n/z)))$. However, for any polynomial compression ratio, i.e., $z = O(n^{1-\delta})$ for constant $\delta > 0$, this becomes $O(m)$. Our index also supports extraction of any substring of length $\ell$ in $O(\ell + \lg(n/z))$ time. Technically, our results are obtained by novel extensions and combinations of existing data structures of independent interest, including a new batched variant of weak prefix search.

Dagstuhl Research Online Publication Server
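
To make the parameter $z$ concrete, here is a minimal sketch of a greedy LZ77 parse. It assumes the classic phrase format (source position, copy length, one trailing literal character) with self-referential copies allowed; other LZ77 variants count phrases slightly differently. The function name `lz77_parse` is hypothetical, and the naive search is far slower than the linear-time parsers used in practice; it only shows what $z$ counts.

```python
def lz77_parse(s: str) -> list[tuple[int, int, str]]:
    """Greedy LZ77 parse of s into (source, length, literal) phrases.

    Each phrase copies s[source:source + length] (the source starts before the
    phrase but may overlap it) and then appends one literal character.
    Naive search, cubic time in the worst case: for illustration only.
    """
    phrases = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, 0
        for src in range(i):
            length = 0
            # Stop one short of the end so a literal character always remains.
            while i + length < n - 1 and s[src + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        phrases.append((best_src, best_len, s[i + best_len]))
        i += best_len + 1
    return phrases


phrases = lz77_parse("abracadabra")
print(len(phrases))  # 6, i.e. z = 6 for this string
print(phrases)
```

Here $n = 11$ and $z = 6$; the last phrase copies "abr" from position 0 and appends the literal "a".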

Compressed and efficient algorithms and data structures for strings

Author: Ettienne Mikko Berggren
Publication venue: DTU Compute
Publication date: 01/01/2018

Online Research Database In Technology

Implementing a Multi-Agent System in Python

Author: Ettienne Mikko Berggren
Vester Steen
Villadsen Jørgen
Publication venue: Technische Universität Clausthal
Publication date: 01/01/2012

Online Research Database In Technology

Fast Dynamic Arrays

Author: Bille Philip
Christiansen Anders Roy
Ettienne Mikko Berggren
Gørtz Inge Li
Publication date: 01/01/2017

We present a highly optimized implementation of tiered vectors, a data structure for maintaining a sequence of $n$ elements supporting access in $O(1)$ time and insertion and deletion in $O(n^\epsilon)$ time for $\epsilon > 0$ while using $o(n)$ extra space. We consider several different implementation optimizations in C++ and compare their performance to that of vector and multiset from the standard library on sequences with up to $10^8$ elements. Our fastest implementation uses much less space than multiset while providing speedups of $40\times$ for access operations compared to multiset and speedups of $10{,}000\times$ compared to vector for insertion and deletion operations, while being competitive with both data structures for all other operations.

arXiv.org e-Print Archive

Online Research Database In Technology

Fast Dynamic Arrays

Author: Bille Philip
Christiansen Anders Roy
Ettienne Mikko Berggren
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 25th Annual European Symposium on Algorithms (ESA 2017)
Publication date: 01/01/2017

We present a highly optimized implementation of tiered vectors, a data structure for maintaining a sequence of $n$ elements supporting access in $O(1)$ time and insertion and deletion in $O(n^\epsilon)$ time for $\epsilon > 0$ while using $o(n)$ extra space. We consider several different implementation optimizations in C++ and compare their performance to that of vector and set from the standard library on sequences with up to $10^8$ elements. Our fastest implementation uses much less space than set while providing speedups of $40\times$ for access operations compared to set and speedups of $10{,}000\times$ compared to vector for insertion and deletion operations, while being competitive with both data structures for all other operations.

Dagstuhl Research Online Publication Server
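
A two-level tiered vector can be sketched as follows, assuming a fixed block size $b$: the sequence is split into blocks of $b$ elements, each stored as a circular buffer, and every block except the last is kept full. Access is $O(1)$ index arithmetic. An insertion or deletion shifts elements inside one block in $O(b)$ time and then moves a single element across each later block boundary in $O(1)$ per block, for $O(b + n/b)$ in total, which is $O(\sqrt{n})$ when $b \approx \sqrt{n}$ (the case $\epsilon = 1/2$). The class names `RingBlock` and `TieredVector` are hypothetical. This is a plain illustration, not the optimized C++ implementation evaluated above; a complete version would also rebuild with a new $b$ as $n$ changes, and deeper hierarchies reach smaller exponents.

```python
class RingBlock:
    """Fixed-capacity circular buffer: O(1) access and push/pop at both ends."""

    def __init__(self, cap: int):
        self.buf = [None] * cap
        self.cap, self.start, self.size = cap, 0, 0  # start: slot of element 0

    def get(self, j):
        return self.buf[(self.start + j) % self.cap]

    def set(self, j, v):
        self.buf[(self.start + j) % self.cap] = v

    def push_front(self, v):
        self.start = (self.start - 1) % self.cap
        self.buf[self.start] = v
        self.size += 1

    def push_back(self, v):
        self.set(self.size, v)
        self.size += 1

    def pop_front(self):
        v = self.buf[self.start]
        self.start = (self.start + 1) % self.cap
        self.size -= 1
        return v

    def pop_back(self):
        self.size -= 1
        return self.get(self.size)

    def insert(self, j, v):  # shift the tail right by one slot: O(cap)
        self.size += 1
        for t in range(self.size - 1, j, -1):
            self.set(t, self.get(t - 1))
        self.set(j, v)

    def delete(self, j):  # shift the tail left by one slot: O(cap)
        v = self.get(j)
        for t in range(j, self.size - 1):
            self.set(t, self.get(t + 1))
        self.size -= 1
        return v


class TieredVector:
    """Two-level tiered vector; every block except the last is full."""

    def __init__(self, block_size: int = 4):
        self.b = block_size
        self.blocks: list[RingBlock] = []
        self.n = 0

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        k, j = divmod(i, self.b)  # O(1): block index and offset inside it
        return self.blocks[k].get(j)

    def insert(self, i, v):
        if not 0 <= i <= self.n:
            raise IndexError(i)
        if self.n == len(self.blocks) * self.b:
            self.blocks.append(RingBlock(self.b))  # all blocks full: open a new one
        k, j = divmod(i, self.b)
        # Make room in block k: each later block takes its predecessor's last element.
        for t in range(len(self.blocks) - 1, k, -1):
            self.blocks[t].push_front(self.blocks[t - 1].pop_back())
        self.blocks[k].insert(j, v)
        self.n += 1

    def delete(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        k, j = divmod(i, self.b)
        v = self.blocks[k].delete(j)
        # Refill block k: each later block gives its first element to its predecessor.
        for t in range(k + 1, len(self.blocks)):
            self.blocks[t - 1].push_back(self.blocks[t].pop_front())
        if self.blocks[-1].size == 0:
            self.blocks.pop()
        self.n -= 1
        return v


tv = TieredVector(block_size=3)
for x in range(10):
    tv.insert(len(tv), x)
tv.insert(0, -1)
tv.delete(5)
print([tv[i] for i in range(len(tv))])  # [-1, 0, 1, 2, 3, 5, 6, 7, 8, 9]
```

Circular buffers are the design choice that makes the cascade cheap: moving one element between neighbouring blocks only adjusts a start offset instead of shifting a whole block.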

Compressed Indexing with Signature Grammars

Author: Christiansen Anders Roy
Ettienne Mikko Berggren
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

The compressed indexing problem is to preprocess a string $S$ of length $n$ into a compressed representation that supports pattern matching queries. That is, given a string $P$ of length $m$, report all occurrences of $P$ in $S$. We present a data structure that supports pattern matching queries in $O(m + occ(\lg\lg n + \lg^\epsilon z))$ time using $O(z\lg(n/z))$ space, where $z$ is the size of the LZ77 parse of $S$ and $\epsilon > 0$ is an arbitrarily small constant, when the alphabet is small or $z = O(n^{1-\delta})$ for any constant $\delta > 0$. We also present two data structures for the general case: one where the space is increased by $O(z\lg\lg z)$, and one where the query time changes from worst-case to expected. These results improve the previously best known solutions. Notably, this is the first data structure that decides if $P$ occurs in $S$ in $O(m)$ time using $O(z\lg(n/z))$ space. Our results are mainly obtained by a novel combination of a randomized grammar construction algorithm with well-known techniques relating pattern matching to 2D range reporting.

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology
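
The link between pattern matching and 2D range reporting can be illustrated on phrase boundaries. A minimal sketch, assuming the phrase start positions are already known: each boundary $b$ becomes a point whose x-coordinate is the rank of the reversed prefix $S[0..b)$ and whose y-coordinate is the rank of the suffix $S[b..n)$. For each split of $P$ into $P[0..k)$ and $P[k..m)$, the occurrences spanning $b$ at offset $k$ are exactly the points inside the rectangle "reversed prefix starts with the reverse of $P[0..k)$" times "suffix starts with $P[k..m)$". Grammar-based indexes apply the same idea to the split points of grammar rules. The function names are hypothetical, a brute-force scan over the points stands in for a real range-reporting structure, and occurrences that do not span any listed boundary are not reported by this step; this is the generic reduction, not the signature-grammar index itself.

```python
from bisect import bisect_left


def prefix_range(sorted_strs: list[str], p: str) -> tuple[int, int]:
    """Index range [lo, hi) of the strings in sorted_strs that start with p."""
    lo = bisect_left(sorted_strs, p)
    hi = lo
    while hi < len(sorted_strs) and sorted_strs[hi].startswith(p):
        hi += 1
    return lo, hi


def primary_occurrences(s: str, boundaries: list[int], p: str) -> list[int]:
    """Start positions of occurrences of p (len >= 2) spanning a phrase boundary."""
    revs = sorted(s[:b][::-1] for b in boundaries)  # x-axis: reversed prefixes
    sufs = sorted(s[b:] for b in boundaries)        # y-axis: suffixes
    points = [(revs.index(s[:b][::-1]), sufs.index(s[b:]), b) for b in boundaries]
    found = set()
    for k in range(1, len(p)):
        # p[:k] must end just before b, and p[k:] must start at b.
        x_lo, x_hi = prefix_range(revs, p[:k][::-1])
        y_lo, y_hi = prefix_range(sufs, p[k:])
        # Brute-force scan standing in for a 2D range-reporting structure.
        for x, y, b in points:
            if x_lo <= x < x_hi and y_lo <= y < y_hi:
                found.add(b - k)
    return sorted(found)


# Phrase starts (excluding 0) of an LZ77 parse of "abracadabra" into copy+literal phrases.
print(primary_occurrences("abracadabra", [1, 2, 3, 5, 7], "abra"))  # [0]
```

"abra" also occurs at position 7, but that occurrence starts exactly at a boundary and spans none of the listed ones, so this step does not report it.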

Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing

Author: Bille Philip
Ettienne Mikko Berggren
Gørtz Inge Li
Vildhøj Hjalte Wedel
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2017

Given a string $S$, the compressed indexing problem is to preprocess $S$ into a compressed representation that supports fast substring queries. The goal is to use little space relative to the compressed size of $S$ while supporting fast queries. We present a compressed index based on the Lempel-Ziv 1977 compression scheme. We obtain the following time-space trade-offs. For constant-sized alphabets: (i) $O(m + occ \lg\lg n)$ time using $O(z\lg(n/z)\lg\lg z)$ space, or (ii) $O(m(1 + \frac{\lg^\epsilon z}{\lg(n/z)}) + occ(\lg\lg n + \lg^\epsilon z))$ time using $O(z\lg(n/z))$ space. For integer alphabets polynomially bounded by $n$: (iii) $O(m(1 + \frac{\lg^\epsilon z}{\lg(n/z)}) + occ(\lg\lg n + \lg^\epsilon z))$ time using $O(z(\lg(n/z) + \lg\lg z))$ space, or (iv) $O(m + occ(\lg\lg n + \lg^{\epsilon} z))$ time using $O(z(\lg(n/z) + \lg^{\epsilon} z))$ space. Here $n$ and $m$ are the lengths of the input string and the query string, respectively, $z$ is the number of phrases in the LZ77 parse of the input string, $occ$ is the number of occurrences of the query in the input, and $\epsilon > 0$ is an arbitrarily small constant. In particular, (i) improves the leading term in the query time of the previous best solution from $O(m\lg m)$ to $O(m)$ at the cost of increasing the space by a factor $\lg\lg z$. Alternatively, (ii) matches the previous best space bound, but has a leading term in the query time of $O(m(1+\frac{\lg^{\epsilon} z}{\lg (n/z)}))$. However, for any polynomial compression ratio, i.e., $z = O(n^{1-\delta})$ for constant $\delta > 0$, this becomes $O(m)$. Our index also supports extraction of any substring of length $\ell$ in $O(\ell + \lg(n/z))$ time. Technically, our results are obtained by novel extensions and combinations of existing data structures of independent interest, including a new batched variant of weak prefix search.

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology
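
The substring extraction bound above can be contrasted with a naive baseline. Given only the LZ77 phrases, a single character $S[p]$ can be recovered by repeatedly jumping from a copied position to its source until a literal is reached. A minimal sketch, assuming (source, length, literal) phrases whose sources start before their phrase; the function names are hypothetical. Each jump moves strictly left, so the loop terminates, but a chain of jumps can be long; avoiding that cost is what an $O(\ell + \lg(n/z))$ extraction bound is about.

```python
from bisect import bisect_right


def phrase_starts(phrases):
    """Start position in S of each (source, length, literal) phrase."""
    starts, pos = [], 0
    for _, length, _ in phrases:
        starts.append(pos)
        pos += length + 1  # copied part plus one literal character
    return starts


def char_at(phrases, starts, p):
    """Return S[p] by following copy pointers; assumes every source precedes its phrase."""
    while True:
        k = bisect_right(starts, p) - 1  # index of the phrase containing position p
        src, length, literal = phrases[k]
        offset = p - starts[k]
        if offset == length:  # p is this phrase's literal character
            return literal
        p = src + offset  # jump to the copied source, strictly to the left


def extract(phrases, i, ell):
    """Naive extraction of S[i:i + ell] without decompressing the rest of S."""
    starts = phrase_starts(phrases)
    return "".join(char_at(phrases, starts, p) for p in range(i, i + ell))


# An LZ77 parse of "abracadabra" in (source, length, literal) form.
abracadabra = [(0, 0, "a"), (0, 0, "b"), (0, 0, "r"), (0, 1, "c"), (0, 1, "d"), (0, 3, "a")]
print(extract(abracadabra, 7, 4))  # abra
```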

Optimal-Time Dictionary-Compressed Indexes

Author: Christiansen Anders Roy
Ettienne Mikko Berggren
Kociumaka Tomasz
Navarro Gonzalo
Prezza Nicola
Publication date: 04/09/2019

We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result we combine several recent findings, including string attractors (new combinatorial objects encompassing most known compressibility measures for highly repetitive texts) and grammars based on locally-consistent parsing. In more detail, let $\gamma$ be the size of the smallest attractor for a text $T$ of length $n$. The measure $\gamma$ is an (asymptotic) lower bound on the size of dictionary compressors based on Lempel-Ziv, context-free grammars, and many others. The smallest known text representations in terms of attractors use space $O(\gamma\log(n/\gamma))$, and our lightest indexes work within the same asymptotic space. Let $\epsilon > 0$ be a suitably small constant fixed at construction time, $m$ be the pattern length, and $occ$ be the number of its text occurrences. Our index counts pattern occurrences in $O(m+\log^{2+\epsilon}n)$ time, and locates them in $O(m+(occ+1)\log^\epsilon n)$ time. These times already outperform those of most dictionary-compressed indexes, while obtaining the least asymptotic space for any index searching within $O((m+occ)\,\textrm{polylog}\,n)$ time. Further, by increasing the space to $O(\gamma\log(n/\gamma)\log^\epsilon n)$, we reduce the locating time to the optimal $O(m+occ)$, and within $O(\gamma\log(n/\gamma)\log n)$ space we can also count in optimal $O(m)$ time. No dictionary-compressed index had obtained this time before. All our indexes can be constructed in $O(n)$ space and $O(n\log n)$ expected time. As a byproduct of independent interest...

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Online Research Database In Technology
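
The measure $\gamma$ can be made concrete by brute force. A set $\Gamma$ of text positions is a string attractor if every substring of $T$ has at least one occurrence that contains a position of $\Gamma$, and $\gamma$ is the size of the smallest such set. The sketch below (hypothetical function names, 0-based positions) checks this definition directly and finds $\gamma$ by exhaustive search over position subsets, so it is only usable on tiny strings.

```python
from itertools import combinations


def is_string_attractor(t: str, gamma) -> bool:
    """True if every distinct substring of t has an occurrence covering a position in gamma."""
    gamma = set(gamma)
    covered = {}  # substring -> does some occurrence seen so far cover gamma?
    n = len(t)
    for i in range(n):
        for j in range(i + 1, n + 1):  # occurrence t[i:j] covers positions i..j-1
            hit = any(i <= k < j for k in gamma)
            covered[t[i:j]] = covered.get(t[i:j], False) or hit
    return all(covered.values())


def smallest_attractor_size(t: str) -> int:
    """gamma(t) by exhaustive search over position subsets (exponential time)."""
    for size in range(len(t) + 1):
        if any(is_string_attractor(t, g) for g in combinations(range(len(t)), size)):
            return size
    return len(t)  # not reached: the set of all positions is always an attractor


print(smallest_attractor_size("abab"))  # 2
```

For "abab", a single position cannot work because the attractor must touch an occurrence of both "a" and "b", while the positions $\{0, 1\}$ already cover one occurrence of every substring.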

Decompressing Lempel-Ziv Compressed Text

Author: Bille Philip
Ettienne Mikko Berggren
Gagie Travis
Gørtz Inge Li
Prezza Nicola
Publication date: 04/11/2019

We consider the problem of decompressing the Lempel-Ziv 77 representation of a string $S$ of length $n$ using a working space as close as possible to the size $z$ of the input. The folklore solution for the problem runs in $O(n)$ time but requires random access to the whole decompressed text. Another folklore solution is to convert LZ77 into a grammar of size $O(z\log(n/z))$ and then stream $S$ in linear time. In this paper, we show that $O(n)$ time and $O(z)$ working space can be achieved for constant-size alphabets. On general alphabets of size $\sigma$, we describe (i) a trade-off achieving $O(n\log^\delta \sigma)$ time and $O(z\log^{1-\delta}\sigma)$ space for any $0 \leq \delta \leq 1$, and (ii) a solution achieving $O(n)$ time and $O(z\log\log(n/z))$ space. The latter solution, in particular, dominates both folklore algorithms for the problem. Our solutions can, more generally, extract any specified subsequence of $S$ with little overhead on top of the linear running time and working space. As an immediate corollary, we show that our techniques yield improved results for pattern matching problems on LZ77-compressed text.

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Online Research Database In Technology
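
The first folklore solution described in the last abstract (linear time, but with random access to the whole output) amounts to replaying the phrases into a growing buffer. A minimal sketch, assuming (source, length, literal) phrases whose sources start before the phrase; the function name `lz77_decompress` is hypothetical. Copying one character at a time is what makes self-referential (overlapping) sources work, since the copy may read characters that the same phrase has just written. The output buffer takes $O(n)$ working space, which is exactly what the $O(z)$-space algorithms in that paper avoid.

```python
def lz77_decompress(phrases) -> str:
    """Folklore O(n)-time LZ77 decompression with random access to the output."""
    out = []
    for src, length, literal in phrases:
        for t in range(length):
            # One character at a time, so a source overlapping this phrase works.
            out.append(out[src + t])
        out.append(literal)
    return "".join(out)


print(lz77_decompress([(0, 0, "a"), (0, 2, "a")]))  # aaaa (self-referential copy)
print(lz77_decompress([(0, 0, "a"), (0, 0, "b"), (0, 0, "r"),
                       (0, 1, "c"), (0, 1, "d"), (0, 3, "a")]))  # abracadabra
```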